Material for: Spectral Unsupervised Parsing with Additive Tree Metrics
Authors
Abstract
The primary purpose of the supplemental is to provide the theoretical arguments that our algorithm is correct. We first give the proof that our proposed tree metric is indeed tree additive. We then analyze the consistency of Algorithm 1.

1 Path Additivity

We first prove that our proposed tree metric is path additive, based on the proof technique in Song et al. (2011).

Lemma 1. If Assumption 1 in the main paper holds, then d_spectral is an additive metric.

Proof. For conciseness, we simply prove the property for paths of length 2; the proof for more general cases follows similarly (e.g. see Anandkumar et al. (2011)). First note that the relationship between eigenvalues and singular values allows us to rewrite the distance metric as

\[ d_{\mathrm{spectral}}(i, j) = -\tfrac{1}{2} \log \Lambda_m\!\left( \Sigma_x(i, j)\, \Sigma_x(i, j)^{\top} \right) + \cdots \]
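The rewrite in the truncated proof sketch above leans on a standard fact relating singular values and eigenvalues. The following is a minimal statement of that fact, under the assumption (made here for illustration; the paper's own convention is not visible in this excerpt) that Λ_m(A) denotes the product of the top m singular values of A on the left-hand side and the product of the top m eigenvalues of the symmetric product on the right-hand side:

```latex
% Path additivity for a path of length 2: if node k lies on the tree path
% between i and j, then d_spectral(i, j) = d_spectral(i, k) + d_spectral(k, j).
%
% The rewrite uses that the singular values of a matrix A are the square
% roots of the eigenvalues of A A^T. With the conventions stated above,
\[
  \log \Lambda_m(A) \;=\; \tfrac{1}{2}\, \log \Lambda_m\!\bigl(A A^{\top}\bigr),
\]
% so each singular-value term in the definition of d_spectral can be traded
% for an eigenvalue term of the corresponding matrix product, as in the
% truncated expression above.
```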
Similar resources
Spectral Unsupervised Parsing with Additive Tree Metrics
We propose a spectral approach for unsupervised constituent parsing that comes with theoretical guarantees on latent structure recovery. Our approach is grammarless – we directly learn the bracketing structure of a given sentence without using a grammar model. The main algorithm is based on lifting the concept of additive tree metrics for structure learning of latent trees in the phylogenetic a...
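The phylogenetic machinery alluded to here rests on a basic property of additive tree metrics: for any four leaves, the pairing with the smallest sum of pairwise distances identifies the quartet's split (the four-point condition). The Python sketch below illustrates this on a hypothetical toy metric; it is not the paper's d_spectral, its learned distances, or its Algorithm 1.

```python
import numpy as np

# Toy additive tree metric on leaves {0, 1, 2, 3}: leaves 0 and 1 hang off one
# internal node, leaves 2 and 3 off another, and all edges have length 1.
d = np.array([
    [0., 2., 3., 3.],
    [2., 0., 3., 3.],
    [3., 3., 0., 2.],
    [3., 3., 2., 0.],
])

def quartet_split(d, i, j, k, l):
    """Return the leaf pairing with the smallest sum of pairwise distances.

    For an additive tree metric the two larger sums are equal (the four-point
    condition), and the smallest sum reveals the quartet topology.
    """
    sums = {
        ((i, j), (k, l)): d[i, j] + d[k, l],
        ((i, k), (j, l)): d[i, k] + d[j, l],
        ((i, l), (j, k)): d[i, l] + d[j, k],
    }
    return min(sums, key=sums.get)

print(quartet_split(d, 0, 1, 2, 3))  # -> ((0, 1), (2, 3)), i.e. the split 01|23
```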
Identifiability and Unmixing of Latent Parse Trees
This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the p...
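The Jacobian-rank check described here can be sketched numerically: map the model parameters to the observable distribution, take a finite-difference Jacobian at a generic parameter point, and test whether it has full column rank. The Python sketch below applies the idea to a toy three-view mixture rather than to any of the parsing models in the paper; the model, parameter point, and step size are illustrative assumptions.

```python
import numpy as np

def joint_dist(theta):
    """Toy model: a 2-component mixture of three independent Bernoulli views.

    Maps the 7 parameters (1 mixing weight + 2x3 emission probabilities) to
    the joint distribution over {0, 1}^3, i.e. 8 probabilities.
    """
    w = theta[0]
    p = theta[1:].reshape(2, 3)          # p[c, v] = P(x_v = 1 | component c)
    probs = np.zeros(8)
    for idx in range(8):
        x = [(idx >> v) & 1 for v in range(3)]
        for c, wc in enumerate((w, 1.0 - w)):
            probs[idx] += wc * np.prod([p[c, v] if x[v] else 1.0 - p[c, v]
                                        for v in range(3)])
    return probs

def numerical_jacobian(f, theta, eps=1e-6):
    """Central-difference Jacobian of f at theta."""
    f0 = f(theta)
    J = np.zeros((f0.size, theta.size))
    for k in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[k] += eps
        tm[k] -= eps
        J[:, k] = (f(tp) - f(tm)) / (2.0 * eps)
    return J

rng = np.random.default_rng(0)
theta = rng.uniform(0.1, 0.9, size=7)    # a generic parameter point
J = numerical_jacobian(joint_dist, theta)

# Full column rank at a generic point is numerical evidence for local
# identifiability (up to relabelling of the mixture components).
print(np.linalg.matrix_rank(J), "of", theta.size)
```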
Evaluating Unsupervised Part-of-Speech Tagging for Grammar Induction
This paper explores the relationship between various measures of unsupervised part-of-speech tag induction and the performance of both supervised and unsupervised parsing models trained on induced tags. We find that no standard tagging metrics correlate well with unsupervised parsing performance, and several metrics grounded in information theory have no strong relationship with even supervised...
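For concreteness, one commonly used measure of tag-induction quality in this literature is many-to-one accuracy: each induced cluster is mapped to its most frequent gold tag, and the relabelled sequence is scored against the gold tags. The sketch below is a generic illustration on hypothetical toy data, not the paper's evaluation code.

```python
from collections import Counter, defaultdict

def many_to_one_accuracy(induced, gold):
    """Map each induced tag to its most frequent gold tag, then score."""
    by_cluster = defaultdict(Counter)
    for i_tag, g_tag in zip(induced, gold):
        by_cluster[i_tag][g_tag] += 1
    mapping = {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}
    correct = sum(mapping[i] == g for i, g in zip(induced, gold))
    return correct / len(gold)

# Hypothetical toy data: 3 induced clusters against gold POS tags.
induced = [0, 0, 1, 1, 2, 2, 2]
gold = ["DT", "DT", "NN", "VB", "NN", "NN", "NN"]
print(many_to_one_accuracy(induced, gold))  # 6/7 ≈ 0.857
```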
Spectral Probabilistic Modeling and Applications to Natural Language Processing
Probabilistic modeling with latent variables is a powerful paradigm that has led to key advances in many applications such as natural language processing, text mining, and computational biology. Unfortunately, while introducing latent variables substantially increases representation power, learning and modeling can become considerably more complicated. Most existing solutions largely ignore non-id...
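A small illustration of what "spectral" means operationally in this setting: when two observed variables are conditionally independent given a hidden state, their co-occurrence matrix factorizes through that state, so its rank (visible in its singular values) is bounded by the number of hidden states. The sketch below uses hypothetical toy parameters and is not tied to any particular model in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent-variable model: a hidden state h with k = 3 values emits two
# conditionally independent discrete observations x1, x2 over V = 10 symbols.
k, V = 3, 10
prior = rng.dirichlet(np.ones(k))          # P(h)
O1 = rng.dirichlet(np.ones(V), size=k)     # O1[h] = P(x1 = . | h)
O2 = rng.dirichlet(np.ones(V), size=k)     # O2[h] = P(x2 = . | h)

# Population co-occurrence matrix P[a, b] = P(x1 = a, x2 = b). It factorizes
# as O1^T diag(prior) O2, so its rank is at most the number of hidden states.
P = O1.T @ np.diag(prior) @ O2

print(np.linalg.matrix_rank(P))                          # -> 3
print(np.round(np.linalg.svd(P, compute_uv=False), 4))   # only 3 nonzero singular values
```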
An All-Subtrees Approach to Unsupervised Parsing
We investigate generalizations of the all-subtrees "DOP" approach to unsupervised parsing. Unsupervised DOP models assign all possible binary trees to a set of sentences and next use (a large random subset of) all subtrees from these binary trees to compute the most probable parse trees. We will test both a relative frequency estimator for unsupervised DOP and a maximum likelihood estimator whic...
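To make the "all possible binary trees" step concrete: an n-word sentence has Catalan(n-1) binary bracketings, and they can be enumerated by choosing a split point for every span. The sketch below only enumerates bracketings; it does not implement the DOP estimators discussed above.

```python
from functools import lru_cache

def all_binary_trees(words):
    """Enumerate every binary bracketing of a word sequence (Catalan(n-1) many)."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # All binary trees over the span words[i:j].
        if j - i == 1:
            return [words[i]]
        return [(left, right)
                for split in range(i + 1, j)
                for left in trees(i, split)
                for right in trees(split, j)]

    return trees(0, len(words))

for t in all_binary_trees("watch the dog bark".split()):
    print(t)
# Prints the 5 bracketings of a 4-word sentence,
# e.g. (('watch', 'the'), ('dog', 'bark')).
```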